1. GENERAL REMARKS

Let $K$ be a reversible Markov kernel on a measurable space $(S,\mathcal{B})$ with stationary distribution $P$. Regard $K$ as a linear operator, $K : L^2(P) \to L^2(P)$, and suppose that $L^2(P)$ admits an orthonormal basis of (real) eigenfunctions $\varphi_0, \varphi_1, \ldots$ for $K$. Thus,

$$K\varphi_j(s) = \int \varphi_j(t)\,K(s,dt) = \beta_j\,\varphi_j(s), \qquad s \in S,\ j = 1, 2, \ldots,$$

for some (real) eigenvalue $\beta_j$. Under mild additional conditions,

$$4\,\|K^\ell(s,\cdot) - P\|^2 \le \sum_{j>0} \beta_j^{2\ell}\,\varphi_j^2(s) \quad \text{for all } s \in S, \tag{1}$$

where $\|\cdot\|$ is the total variation norm and $K^\ell$ the $\ell$th iterate of $K$. Using (1) is quite natural in MCMC,
where information on the convergence rate is crucial.
For the 2-component Gibbs sampler, however, one 
drawback is that K is generally not reversible. 
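To make (1) concrete, here is a minimal sketch on a two-state reversible chain, where the eigenvalue and eigenfunction are available in closed form. The chain, the function name `spectral_check` and the parameters `p`, `q` are illustrative assumptions and are not taken from DKS; total variation is computed as half the $\ell_1$ distance.

```python
def spectral_check(p, q, ell, start=0):
    """Return (4*TV^2, beta_1^(2*ell) * phi_1(start)^2) for a two-state chain.

    The chain moves 0 -> 1 with probability p and 1 -> 0 with probability q;
    it is reversible with respect to P = (q, p) / (p + q).
    """
    P = (q / (p + q), p / (p + q))
    K = ((1.0 - p, p), (q, 1.0 - q))
    beta1 = 1.0 - p - q  # the only eigenvalue other than 1
    # Eigenfunction normalised in L^2(P): mean 0 and second moment 1 under P.
    phi1 = ((P[1] / P[0]) ** 0.5, -((P[0] / P[1]) ** 0.5))

    row = [1.0 if j == start else 0.0 for j in range(2)]  # point mass at start
    for _ in range(ell):
        row = [row[0] * K[0][j] + row[1] * K[1][j] for j in range(2)]

    tv = 0.5 * sum(abs(row[j] - P[j]) for j in range(2))
    return 4.0 * tv ** 2, beta1 ** (2 * ell) * phi1[start] ** 2


for ell in (1, 2, 5, 10):
    lhs, rhs = spectral_check(0.3, 0.1, ell)
    print(ell, lhs, rhs, lhs <= rhs)
```

In this toy case both sides decay like $\beta_1^{2\ell}$, so the bound captures the exact rate and is off only by a constant factor.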

Diaconis, Khare and Saloff-Coste (DKS, in the sequel) get around this problem by noting that the
marginal chains (the $x$-chain and the $\theta$-chain) are reversible, and that bounding the marginal chains yields bounds on the bivariate chain. More importantly, in a few examples, DKS are able to diagonalize the marginal kernels, that is, to evaluate their eigenvalues and eigenfunctions. A basic fact is that, in such examples, the eigenfunctions agree with the orthogonal polynomials corresponding to the marginals of $P$.

Following this route, DKS give explicit sharp estimates, both lower and upper, on the convergence rate of a 2-component Gibbs sampler. Their results are interesting and elegant, and they hold promise of some generalizations. On the other hand, since an explicit diagonalization is required, they cover only a few particular cases. In real problems, when sampling from $P$, the available information is usually not enough for a diagonalization. Moreover, it is not clear how to handle the $k$-component Gibbs sampler for $k > 2$ using DKS's argument. Thus, in addition to DKS's bounds, it could be useful to have other estimates of the convergence rate, possibly less sharp but with a broader scope.

Here, we adopt the latter point of view and look 
for estimates based on classical drift conditions. In 
a sense, we investigate the extent of DKS's words 
in Section 1: "Finding useful V and q is currently a 
matter of art" (where V and q are the ingredients of 
a drift condition). We will play the devil's advocate, 
of course. 

2. PLAIN ERGODICITY 

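As far as possible, our notation agrees with DKS's. Thus, $(\mathcal{X},\mathcal{F})$ and $(\Theta,\mathcal{G})$ are measurable spaces, with $\mathcal{F}$ and $\mathcal{G}$ countably generated, and $P$ is a probability measure on the product $\sigma$-field $\mathcal{F}\otimes\mathcal{G}$. We let

$$X : \mathcal{X}\times\Theta \to \mathcal{X} \quad\text{and}\quad T : \mathcal{X}\times\Theta \to \Theta$$

denote the canonical projections. It is assumed that $P$ has a density $f$ with respect to $\mu\times\pi$, where $\mu$ is a $\sigma$-finite measure on $\mathcal{F}$ and $\pi = P\circ T^{-1}$ is the prior. Also, $m(x) = \int f(x,\theta)\,\pi(d\theta)$ is the density of $P\circ X^{-1}$ with respect to $\mu$. As in DKS, we assume $0 < m(x) < \infty$ for all $x\in\mathcal{X}$.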
We always refer to the Gibbs sampler with kernel

$$J((x,\theta),C) = \frac{1}{m(x)} \int\!\!\int I_C(a,b)\,f(x,b)\,f(a,b)\,\mu(da)\,\pi(db),$$

where $(x,\theta)\in\mathcal{X}\times\Theta$ and $C\in\mathcal{F}\otimes\mathcal{G}$. Loosely speaking, this is the version of the Gibbs sampler where the initial state $(x,\theta)$ is first updated into $(x,b)$ and then into $(a,b)$. Abusing notation, since $J$ only depends on $x$, we write $J(x,\cdot)$ instead of $J((x,\theta),\cdot)$.
Note that DKS denote our $J$ by $K$.
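The two-stage update behind $J$ can be sketched in a few lines. As a concrete instance we assume the Beta/Binomial model with uniform prior (taken up in Example 4.1.1), where the posterior of $\theta$ given $x$ is Beta$(x+1, n-x+1)$ and $x$ given $\theta = b$ is Binomial$(n,b)$; the function names `gibbs_step` and `run_chain` are hypothetical.

```python
import random


def gibbs_step(x, n, rng):
    """One sweep (x, theta) -> (x, b) -> (a, b).

    The current theta is not an argument: the kernel J depends on x only.
    Assumed model: uniform prior on theta, Binomial(n, theta) likelihood.
    """
    b = rng.betavariate(x + 1, n - x + 1)          # theta | x
    a = sum(rng.random() < b for _ in range(n))    # x | theta = b
    return a, b


def run_chain(x0, n, steps, seed=0):
    """Run the sampler from initial x0 and return the visited pairs (a, b)."""
    rng = random.Random(seed)
    x, path = x0, []
    for _ in range(steps):
        x, theta = gibbs_step(x, n, rng)
        path.append((x, theta))
    return path


path = run_chain(x0=10, n=10, steps=5000, seed=1)
print(sum(x for x, _ in path) / len(path))  # should be near n/2 under the uniform marginal
```

The binomial draw is written as a sum of Bernoulli indicators only to stay within the standard library.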

A first point to be settled, before discussing rates of convergence, is ergodicity. Indeed, for Gibbs sampling to make sense, $J$ should be ergodic, in the sense that

$$\|J^\ell(x,\cdot) - P\| \to 0 \quad\text{for all } x\in\mathcal{X} \text{ as } \ell\to\infty.$$

A simple equivalent condition is in Berti, Pratelli and Rigo (2008, Theorem 4.5). Letting $\mathcal{N} = \{C\in\mathcal{F}\otimes\mathcal{G} : P(C) = 0\}$, $J$ is ergodic if and only if



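$$\overline{\sigma}(X)\cap\overline{\sigma}(T) = \mathcal{N}, \tag{2}$$

where $\overline{\sigma}(X) = \sigma(\sigma(X)\cup\mathcal{N})$ and $\overline{\sigma}(T) = \sigma(\sigma(T)\cup\mathcal{N})$. A more transparent version of (2) is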

$$P(X\in A) = 0 \quad\text{or}\quad P(T\in B) = 0$$

whenever $A\in\mathcal{F}$, $B\in\mathcal{G}$ and

$$P(A\times B) = P(A^c\times B^c) = 0.$$

Moreover, a working sufficient condition for (2) is

$$\{X\in A\}\cap\{T\in B\} \subset \{f>0\} \subset \{X\in A\}\cup\{T\in B\} \tag{3}$$

for some $A\in\mathcal{F}$, $B\in\mathcal{G}$ with $P(A\times B) > 0$; see Berti, Pratelli and Rigo (2008), Corollary 3.7.

3. UNIFORM ERGODICITY 

Let $K$ be a Markov kernel on $(S,\mathcal{B})$ with stationary distribution $P$. If $K(s,\cdot)\ge\epsilon\,Q(\cdot)$, $s\in S$, for some $\epsilon>0$ and probability $Q$ on $\mathcal{B}$, then $\|K^\ell(s,\cdot)-P\|\le(1-\epsilon)^\ell$, $s\in S$. Coming back to the Gibbs sampler,
this fact implies:

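Proposition 1. If $m$ is bounded, then

$$\|J^\ell(x,\cdot)-P\| \le \Bigl(1-\frac{u}{\sup m}\Bigr)^{\ell} \quad\text{for all } x\in\mathcal{X},$$

where $u = \sup_{B\in\mathcal{G}} \pi(B)\,\inf_{x\in\mathcal{X},\,\theta\in B} f(x,\theta)$.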



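Proof. This is essentially Remark 4.6 of Berti, Pratelli and Rigo (2008). For definiteness, we repeat the calculations here. Let $(S,\mathcal{B}) = (\mathcal{X}\times\Theta,\ \mathcal{F}\otimes\mathcal{G})$, $K = J$ and $u(B) = \pi(B)\,\inf_{\mathcal{X}\times B} f$. It can be assumed $u(B) > 0$ for some $B\in\mathcal{G}$ (otherwise, $u = 0$ and Proposition 1 holds trivially). Fix one such $B$ and define $\epsilon = u(B)/\sup m$ and $Q(\cdot) = P(\cdot\mid T\in B)$. Then,

$$\begin{aligned}
J(x,C) &\ge J(x,\ C\cap\{T\in B\}) \\
&= \frac{1}{m(x)}\int\!\!\int I_C(a,b)\,I_B(b)\,f(x,b)\,f(a,b)\,\mu(da)\,\pi(db) \\
&\ge \frac{\inf_{\mathcal{X}\times B} f}{\sup m}\,P(C\cap\{T\in B\}) = \epsilon\,Q(C)
\end{aligned}$$

for all $x\in\mathcal{X}$ and $C\in\mathcal{F}\otimes\mathcal{G}$. Since $P$ is stationary for $J$, it follows that

$$\|J^\ell(x,\cdot)-P\| \le \Bigl(1-\frac{u(B)}{\sup m}\Bigr)^{\ell} \quad\text{for all } x\in\mathcal{X}.$$

Taking sup over $B$ concludes the proof. □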
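The minorization fact used in the proof, $K(s,\cdot)\ge\epsilon\,Q(\cdot)$ implies $\|K^\ell(s,\cdot)-P\|\le(1-\epsilon)^\ell$, is easy to check numerically on a finite state space. The sketch below uses a hypothetical $3\times 3$ transition matrix (not related to any DKS example), takes $\epsilon$ as the sum of the column minima, and approximates $P$ by power iteration.

```python
def doeblin_epsilon(K):
    """Largest eps with K(s, .) >= eps * Q(.) for every s: sum of column minima."""
    n = len(K)
    return sum(min(K[i][j] for i in range(n)) for j in range(n))


def step(row, K):
    """Push a row vector (distribution) one step through the kernel K."""
    n = len(K)
    return [sum(row[i] * K[i][j] for i in range(n)) for j in range(n)]


def stationary(K, iters=2000):
    """Approximate the stationary distribution by power iteration."""
    row = [1.0 / len(K)] * len(K)
    for _ in range(iters):
        row = step(row, K)
    return row


def tv_from(K, s, ell, P):
    """Total variation distance between K^ell(s, .) and P."""
    row = [1.0 if j == s else 0.0 for j in range(len(K))]
    for _ in range(ell):
        row = step(row, K)
    return 0.5 * sum(abs(a - b) for a, b in zip(row, P))


K = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.1, 0.3, 0.6]]
eps = doeblin_epsilon(K)
P = stationary(K)
for ell in (1, 2, 4, 8):
    print(ell, max(tv_from(K, s, ell, P) for s in range(3)), (1 - eps) ** ell)
```

In the Gibbs setting of Proposition 1 the same role is played by $\epsilon = u(B)/\sup m$ and $Q(\cdot) = P(\cdot\mid T\in B)$.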

By Proposition 1, if $m$ is bounded and $u > 0$ then $J$ is uniformly ergodic, in the sense that $\|J^\ell(x,\cdot)-P\| \le q\,\rho^\ell$, $x\in\mathcal{X}$, for some constants $q$ and $\rho\in(0,1)$ (here, $q = 1$ and $\rho = 1 - \frac{u}{\sup m}$). To fix ideas, this happens in case $\mathcal{X}$ is compact, $\Theta$ a Polish space, $m$ bounded, and $f$ strictly positive and continuous. An example of DKS falls in this class.

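Example 4.1.1 (Beta/Binomial). Let $\pi$ be uniform, so that $m(x) = 1/(n+1)$ for all $x\in\mathcal{X} = \{0,1,\ldots,n\}$. Taking sup over those $B$ of the form $B = [\delta, 1-\delta]$, $0 < \delta < 1/2$, yields $u \ge \frac{1}{n+1}\bigl(\frac{n}{2(n+1)}\bigr)^{n}$.

Thus, Proposition 1 gives $\|J^\ell(x,\cdot)-P\| \le \rho^\ell$ for all $x$ with

$$\rho = 1 - \Bigl(\frac{n}{2(n+1)}\Bigr)^{n}.$$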



Instead, DKS obtain bounds for $x = n$ only; see Proposition 1.1. More precisely,

$$\tfrac{1}{2}\,\beta_1^{\ell} \le \|J^\ell(n,\cdot)-P\| \le 3\,\beta_1^{\ell-1/2}, \quad\text{where } \beta_1 = 1 - \frac{2}{n+2}.$$

Hence, DKS's estimate of the convergence rate, that is $\beta_1$, is (much) better than our $\rho$ for large values of $n$.
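The gap between the two rates is easy to tabulate. The sketch below assumes the Binomial$(n,\theta)$ likelihood, for which $\inf f = \delta^n$ on $\mathcal{X}\times[\delta,1-\delta]$, so that $u \ge \sup_\delta (1-2\delta)\,\delta^n$; it checks the closed-form maximum against a grid search and compares $\rho$ with $\beta_1 = 1 - 2/(n+2)$. Function names and the grid size are illustrative choices.

```python
def u_closed_form(n):
    """Maximum of (1 - 2*delta) * delta**n, attained at delta = n / (2(n+1))."""
    return (1.0 / (n + 1)) * (n / (2.0 * (n + 1))) ** n


def u_grid(n, grid=20000):
    """Grid search of (1 - 2*delta) * delta**n over 0 < delta < 1/2."""
    return max((1 - 2 * d) * d ** n
               for d in (k * 0.5 / grid for k in range(1, grid)))


def rho_prop1(n):
    """Rate from Proposition 1: 1 - u / sup m with sup m = 1/(n+1)."""
    return 1.0 - (n / (2.0 * (n + 1))) ** n


def beta1_dks(n):
    """Second eigenvalue quoted from DKS for the uniform prior."""
    return 1.0 - 2.0 / (n + 2)


for n in (1, 2, 5, 10, 20):
    print(n, rho_prop1(n), beta1_dks(n))
```

Since $(n/(2(n+1)))^n$ decays roughly like $2^{-n}$, $1-\rho$ shrinks exponentially in $n$ while $1-\beta_1$ shrinks only like $2/n$.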
4. GEOMETRIC ERGODICITY

We first recall a general result on Markov chains. 

Theorem 2 [Rosenthal (1995)]. Let $K$ be an ergodic Markov kernel on $(S,\mathcal{B})$ with stationary distribution $P$. Suppose

$$Kg(s) \le \alpha + \beta\,g(s), \qquad s\in S, \tag{4}$$

for some measurable function $g : S\to\mathbb{R}^+$ and constants $\alpha$ and $\beta\in(0,1)$. Fix $d > 2\alpha/(1-\beta)$, define $D = \{s\in S : g(s)\le d\}$ and suppose also that

$$K(s,\cdot) \ge \epsilon\,Q(\cdot), \qquad s\in D, \tag{5}$$

for some $\epsilon > 0$ and probability $Q$ on $\mathcal{B}$. Then, for all $r\in(0,1)$ and $s\in S$,

$$\|K^\ell(s,\cdot)-P\| \le (1-\epsilon)^{r\ell} + t^{\ell}\Bigl[1+\frac{\alpha}{1-\beta}+g(s)\Bigr],$$

where

$$t = \frac{(1+2\alpha+2\beta d)^{r}\,(1+2\alpha+\beta d)^{1-r}}{(1+d)^{1-r}}.$$
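The bound of Theorem 2 is a plain formula in $(\epsilon, \alpha, \beta, d, r)$, so it can be evaluated directly. The following is a small sketch under the theorem's stated constraints; the function names `rosenthal_t` and `rosenthal_bound` and the sample inputs are assumptions made for illustration.

```python
def rosenthal_t(alpha, beta, d, r):
    """The constant t of Theorem 2."""
    return ((1 + 2 * alpha + 2 * beta * d) ** r
            * (1 + 2 * alpha + beta * d) ** (1 - r)
            / (1 + d) ** (1 - r))


def rosenthal_bound(ell, eps, alpha, beta, d, r, g_s):
    """Right-hand side of the bound in Theorem 2 at iteration ell.

    g_s is the value g(s) of the drift function at the initial state s.
    """
    if not (0 < beta < 1 and 0 < r < 1 and 0 < eps <= 1):
        raise ValueError("need 0 < beta < 1, 0 < r < 1 and 0 < eps <= 1")
    if not d > 2 * alpha / (1 - beta):
        raise ValueError("need d > 2*alpha/(1 - beta)")
    t = rosenthal_t(alpha, beta, d, r)
    return (1 - eps) ** (r * ell) + t ** ell * (1 + alpha / (1 - beta) + g_s)


for ell in (1, 10, 100):
    print(ell, rosenthal_bound(ell, eps=0.1, alpha=0.5, beta=0.5,
                               d=10.0, r=0.05, g_s=3.0))
```

The bound is informative only when $t < 1$, which depends on the choice of $r$ for a given $d$; small $r$ helps $t$ but slows the factor $(1-\epsilon)^{r\ell}$.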



In a Gibbs sampling framework, Theorem 2 turns 
into: 

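Proposition 3. Suppose condition (2) holds and

$$J\phi(x) \le \alpha + \beta\,\phi(x), \qquad x\in\mathcal{X}, \tag{6}$$

for some measurable function $\phi : \mathcal{X}\to\mathbb{R}^+$ and constants $\alpha$ and $\beta\in(0,1)$. Fix $d > 2\alpha/(1-\beta)$, define $A = \{x\in\mathcal{X} : \phi(x)\le d\}$ and suppose also that

$$\sup_{A} m < \infty \quad\text{and}\quad \inf_{A\times B} f > 0 \tag{7}$$

for some $B\in\mathcal{G}$ with $P(A\times B) > 0$. Then, for all $r\in(0,1)$ and $x\in\mathcal{X}$,

$$\|J^\ell(x,\cdot)-P\| \le (1-\epsilon)^{r\ell} + t^{\ell}\Bigl(1+\frac{\alpha}{1-\beta}+\phi(x)\Bigr)$$

with $t$ as in Theorem 2 and $\epsilon = \dfrac{\pi(B)\,\inf_{A\times B} f}{\sup_{A} m}$.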

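Proof. By (2), $J$ is ergodic. By (6), condition (4) holds with $K = J$ and $g(x,\theta) = \phi(x)$. By (7), there is $B\in\mathcal{G}$ with $\inf_{A\times B} f > 0$ and $\pi(B) \ge P(A\times B) > 0$. Since $\sup_A m < \infty$, the same calculation as in the proof of Proposition 1 yields

$$J(x,C) \ge \frac{\pi(B)\,\inf_{A\times B} f}{\sup_{A} m}\,P(C\mid T\in B) \quad\text{for all } x\in A \text{ and } C\in\mathcal{F}\otimes\mathcal{G}.$$

Thus, (5) holds with $\epsilon = \dfrac{\pi(B)\,\inf_{A\times B} f}{\sup_{A} m}$ and $Q(\cdot) = P(\cdot\mid T\in B)$. An application of Theorem 2 concludes the proof. □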

Proposition 3 applies to most of DKS's examples, providing reasonable estimates. Note that:

(i) Condition (2) holds (in fact, (3) holds) in such examples.

(ii) If (7) holds for all $d$, then $t$ can be made arbitrarily close to $\beta$ for suitable $r$, $d$. There is a trade-off, however, since the choice of $r$, $d$ affects $(1-\epsilon)^{r\ell}$.

(iii) Letting $\psi = 1 + \alpha/(1-\beta) + \phi$, one has

$$t^{\ell}\,\psi(x) \le e^{-c} \quad\text{whenever}\quad \ell \ge \{c + \log\psi(x)\}/|\log t|$$

for all $x\in\mathcal{X}$ and $c > 0$. This can serve to estimate the impact of the initial state $x$. It is roughly of the same order as some of DKS's estimates.
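Remarks (ii) and (iii) can be illustrated numerically. The sketch below, with hypothetical values $\alpha = \beta = 1/2$ and $\phi(x) = 20$, shows $t$ approaching $\beta$ for large $d$ and small $r$, and computes the smallest integer $\ell$ satisfying the inequality in (iii). Function names are illustrative assumptions.

```python
import math


def t_constant(alpha, beta, d, r):
    """The constant t of Theorem 2."""
    return ((1 + 2 * alpha + 2 * beta * d) ** r
            * (1 + 2 * alpha + beta * d) ** (1 - r)
            / (1 + d) ** (1 - r))


def steps_for_initial_state(c, psi_x, t):
    """Smallest integer ell with ell >= (c + log psi(x)) / |log t|; needs t < 1."""
    return math.ceil((c + math.log(psi_x)) / abs(math.log(t)))


alpha, beta = 0.5, 0.5
# Remark (ii): large d and small r push t towards beta.
t_far = t_constant(alpha, beta, d=1e6, r=1e-3)

# Remark (iii): iterations needed so that t^ell * psi(x) <= e^{-c}.
t = t_constant(alpha, beta, d=10.0, r=0.05)
psi_x = 1 + alpha / (1 - beta) + 20.0   # phi(x) = 20 is an arbitrary choice
c = 3.0
ell = steps_for_initial_state(c, psi_x, t)
print(t_far, t, ell, t ** ell * psi_x, math.exp(-c))
```

The count grows only logarithmically in $\psi(x)$, which is why a poor initial state has a modest effect on the second term of the bound.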

Example 4.2.1 (Poisson/Gamma). Let $\pi$ be standard exponential, so that $m(x) = 2^{-x-1}$ for $x\in\mathcal{X} = \{0,1,\ldots\}$. We take $\phi(x) = x$. In that case, the set $A = \{\phi\le d\}$ meets condition (7) for all $d > 0$. As to (6), it suffices to note that

$$J\phi(x) = \frac{1}{m(x)}\int\!\!\int a\,f(a,b)\,\mu(da)\,f(x,b)\,\pi(db) = 2^{x+1}\int_0^\infty b\,f(x,b)\,e^{-b}\,db = \frac{x+1}{2}.$$

Hence, Proposition 3 applies with $\alpha = \beta = 1/2$. Now, acting on $r$, $d$, upper bounds on the convergence rate can be easily obtained. At this stage, using numerical evaluations is convenient.
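One way such a numerical evaluation might look is sketched below. It assumes the Poisson likelihood $f(x,b) = e^{-b}b^x/x!$ (consistent with $m(x) = 2^{-x-1}$), checks $J\phi(x) = (x+1)/2$ by a trapezoid rule, and evaluates $\epsilon$ and $t$ of Proposition 3 for $A = \{x\le d\}$ and an interval $B = [b_{lo}, b_{hi}]$. The interval, the grid of $(d,r)$ values and all function names are illustrative choices, not taken from the source.

```python
import math


def f(x, b):
    """Assumed likelihood: Poisson(b) probability of x."""
    return math.exp(-b) * b ** x / math.factorial(x)


def m(x):
    """Marginal density of x under the standard exponential prior."""
    return 2.0 ** (-x - 1)


def J_phi(x, upper=40.0, steps=40000):
    """Trapezoid approximation of (1/m(x)) * int_0^inf b f(x,b) e^{-b} db."""
    h = upper / steps
    total = 0.0
    for k in range(steps + 1):
        b = k * h
        w = 0.5 if k in (0, steps) else 1.0
        total += w * b * f(x, b) * math.exp(-b)
    return total * h / m(x)


def epsilon(d, b_lo, b_hi):
    """eps = pi(B) * inf_{A x B} f / sup_A m for A = {x <= d}, B = [b_lo, b_hi].

    f(x, .) is unimodal in b, so its infimum over B is attained at an endpoint.
    """
    A = range(int(math.floor(d)) + 1)
    pi_B = math.exp(-b_lo) - math.exp(-b_hi)
    inf_f = min(f(x, b) for x in A for b in (b_lo, b_hi))
    sup_m = max(m(x) for x in A)
    return pi_B * inf_f / sup_m


def rate(d, r, b_lo, b_hi, alpha=0.5, beta=0.5):
    """Larger of the two geometric factors (1 - eps)^r and t in Proposition 3."""
    eps = epsilon(d, b_lo, b_hi)
    t = ((1 + 2 * alpha + 2 * beta * d) ** r
         * (1 + 2 * alpha + beta * d) ** (1 - r) / (1 + d) ** (1 - r))
    return max((1 - eps) ** r, t)


best = min((rate(d, r, 1.0, 3.0), d, r)
           for d in (3, 4, 5, 6) for r in (0.05, 0.1, 0.2))
print(best)
```

With these crude choices the minorization constant is small, so the resulting rate is valid but far from sharp; a finer search over $B$, $d$ and $r$ would be needed for a tighter estimate.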

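Example 4.3 (Gaussian). Suppose $\sigma^2 + \tau^2 = 1/2$ and $\pi$ is $N(0,\tau^2)$, so that the posterior distribution $\pi(\cdot\mid x)$ is $N(2\tau^2 x,\ 2\tau^2\sigma^2)$. We take $\phi(x) = |x|$. Again, $A = \{\phi\le d\}$ meets (7) for all $d > 0$. Recalling $E|N(0,1)| = \sqrt{2/\pi}$ (with $\pi$ under the square root being the numerical constant), one obtains

$$\begin{aligned}
J\phi(x) &= \int\!\!\int |a|\,f(a,b)\,da\;\pi(db\mid x) \\
&\le \int \bigl\{|b| + \sigma\sqrt{2/\pi}\bigr\}\,\pi(db\mid x) \\
&\le \sigma\sqrt{2/\pi} + \sqrt{2}\,\sigma\tau\sqrt{2/\pi} + 2\tau^2|x| \\
&= \alpha + 2\tau^2|x|,
\end{aligned}$$

say. Since $2\tau^2 < 2(\sigma^2+\tau^2) = 1$, condition (6) holds with $\beta = 2\tau^2$. Again, acting on $r$, $d$, one gets estimates (even if not optimal) of the convergence rate.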
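The drift inequality of this example can be checked against the exact value of $J\phi(x)$. The sketch assumes that $f(\cdot,b)$ is the $N(b,\sigma^2)$ density, so that one Gibbs sweep from $x$ produces $a \sim N(2\tau^2 x,\ \sigma^2(1+2\tau^2))$, and uses the standard formula for the mean of a folded normal. Function names are hypothetical.

```python
import math


def folded_normal_mean(mu, s):
    """E|Z| for Z ~ N(mu, s^2)."""
    return (s * math.sqrt(2 / math.pi) * math.exp(-mu * mu / (2 * s * s))
            + mu * math.erf(mu / (s * math.sqrt(2))))


def drift_check(x, tau2):
    """Return (J phi(x), alpha + 2 tau^2 |x|) when sigma^2 + tau^2 = 1/2."""
    sigma2 = 0.5 - tau2
    sigma, tau = math.sqrt(sigma2), math.sqrt(tau2)
    # a = b + sigma * Z with b ~ N(2 tau^2 x, 2 tau^2 sigma^2), Z standard normal.
    j_phi = folded_normal_mean(2 * tau2 * x, math.sqrt(sigma2 * (1 + 2 * tau2)))
    alpha = sigma * math.sqrt(2 / math.pi) * (1 + math.sqrt(2) * tau)
    return j_phi, alpha + 2 * tau2 * abs(x)


for x in (-5.0, 0.0, 0.5, 5.0):
    print(x, drift_check(x, tau2=0.3))
```

The slack between the two values shows how much is lost by bounding $E|b|$ and $E|a|$ through the triangle inequality.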
5. HIGHER COMPONENT PROBLEMS AND
CONCLUDING REMARKS

Apparently, DKS's argument does not apply to the $k$-component Gibbs sampler when $k > 2$. On the other hand, Propositions 1 and 3 can be adapted to any value of $k$. We illustrate this point with regard to Proposition 1 for $k = 3$. To this end, notation needs to be updated. Suppose $(\mathcal{X},\mathcal{F})$ is the product of two measurable spaces $(\mathcal{X}_1,\mathcal{F}_1)$, $(\mathcal{X}_2,\mathcal{F}_2)$ and $P$ has a density $f$ with respect to $\mu_1\times\mu_2\times\pi$, where $\mu_i$ is a $\sigma$-finite measure on $\mathcal{F}_i$, $i = 1, 2$. The marginal densities of the pairs $x = (x_1,x_2)$, $(x_1,\theta)$ and $(x_2,\theta)$ are assumed finite and strictly positive everywhere. Also, $h$ denotes the density of $(x_1,\theta)$. Then, Proposition 1 takes the form:

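Proposition 4. Let $J$ be the Markov kernel of the 3-component Gibbs sampler. If $m$ and $h$ are bounded, then

$$\|J^\ell(x,\cdot)-P\| \le \Bigl(1-\frac{v}{\sup m}\Bigr)^{\ell} \quad\text{for all } x\in\mathcal{X},$$

where

$$v = \mu_2(\mathcal{X}_2)\,\sup_{B\in\mathcal{G}} \pi(B)\,\frac{\{\inf_{x\in\mathcal{X},\,\theta\in B} f(x,\theta)\}^{2}}{\sup_{x_1\in\mathcal{X}_1,\,\theta\in B} h(x_1,\theta)}.$$

Incidentally, we note that $\mu_2(\mathcal{X}_2) < \infty$ whenever $\inf_{\mathcal{X}\times B} f > 0$ for some $B\in\mathcal{G}$ with $\pi(B) > 0$.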

Next, we would like to draw the Authors' attention to an issue that might potentially enlarge the scope of their argument. Consonni and Veronese (2001) introduced the concept of conditionally reducible natural exponential families. Basically, they are multivariate natural exponential families whose densities can be expressed as a product of lower dimensional (possibly univariate) conditional exponential families, each being indexed by its own natural parameter. The underlying idea is intimately related to that of a cut. Examples include the multinomial and Wishart sampling families. We wonder whether the methods described by the Authors could be applied recursively to conditionally reducible families admitting a factorization in terms of univariate exponential families, such as the multinomial family.

To sum up, DKS's estimates behave excellently, indeed very close to optimum, in those examples for which they were designed. One further merit is that lower bounds are provided as well. On the other hand, Propositions 1 and 3, presented in this discussion, have a broader scope and can be applied for any initial state $x$ (while DKS's bounds are sometimes available for certain $x$ only), but they can provide less sharp bounds.